Porting the LAMMPS/EAM benchmark to OpenCL(TM) with limited code modifications

Author

  • David A. Richie
Abstract

Introduction. Demonstrations of GPUs for technical computing normally focus on GPU-tuned algorithms that accelerate applications over the baseline performance on a CPU. Although speedups of 100x or greater are sometimes reported, more realistic speedups are in the range of 2x to 10x when careful consideration and effort are given to comparable optimizations for the CPU. Work on accelerating GPU-optimized code often neglects the realities of existing production HPC codes in terms of acceptable modifications, where absolute performance is only one of many factors considered in software design decisions. This white paper discusses the initial results from an investigation of a pre-defined benchmark for a production molecular dynamics code (LAMMPS) using OpenCL

Related resources

Implementing molecular dynamics on hybrid high performance computers - short range forces

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages...

High-Level Manipulation of OpenCL-Based Subvectors and Submatrices

High-level C++ proxies for the convenient manipulation of subvectors and submatrices on OpenCL-enabled devices are introduced. It is demonstrated that the programming convenience of standard host-based code can be retained using native C++ language features only, even if massively parallel computing architectures such as graphics processing units are employed. The required modifications of the ...

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the cod...

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently; however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited ...

Swan: A tool for porting CUDA programs to OpenCL

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-in...

Publication date: 2011